
ED 097 366 



DOCUMENT RESUME

95 



TM 004 001



AUTHOR          Mushkin, Selma J.

TITLE           A Proposal for a "SIR" Adjusted Index of Educational
                Competence.

INSTITUTION     Georgetown Univ., Washington, D.C. Public Services
                Lab.

SPONS AGENCY    Office of Education (DHEW), Washington, D.C.

REPORT NO       DHEW-OE-74-11112

PUB DATE        Aug 73

CONTRACT        OEC-0-70-4454

NOTE            60p.

AVAILABLE FROM  Superintendent of Documents, U.S. Government
                Printing Office, Washington, D.C. 20402 ($1.00)

EDRS PRICE      MF-$0.75 HC-$3.15 PLUS POSTAGE

DESCRIPTORS     *Academic Achievement; Achievement Tests;
                *Comparative Analysis; *Demography; Education;
                Equated Scores; *Evaluation Methods; Income; Policy
                Formation; Program Effectiveness; Race; Research
                Methodology; Research Problems; Sex Differences;
                *Testing Problems; Testing Programs; Test Results;
                Universal Education

IDENTIFIERS     Educational Outcomes; *SIR Adjusted Index

ABSTRACT 

The increasing use of educational performance or
outcome measurements for a range of policy purposes points to new
procedures for adjusting data for population composition. The
purposes include: program formulation, budget resource allocation,
grant-in-aid designs, performance incentive payments, consumer
information for school selection, and program evaluation and review.
This paper outlines methods for controlling population differences to
make data on performance more comparable across time and from place
to place. The resulting estimates of achievement scores, standardized
for population differences, are useful for comparison only. Such
comparative indexes remove the influence on average scores of
population changes over time, or population differences between
schools or school districts. Adjusted scores are not intended to take
the place of the basic data but to complement them. Standardization
procedures can be applied to achievement test scores and to other
measurements of competence such as attitudes or attributes. In this
report, achievement score adjustment is used as an example. The
selection of sex, income, and race (SIR) as control variables is
proposed as a first step. (Author/SE)



DHEW Publication No. (OE) 74-11112



A PROPOSAL FOR A "SIR" ADJUSTED INDEX

OF 

EDUCATIONAL COMPETENCE 



AUGUST 1973 






U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
Caspar W. Weinberger, Secretary 

Charles B. Saunders, Jr., Acting Assistant Secretary for Education 



Office of Education 
John Ottina, Commissioner 






This report was prepared by Selma J. Mushkin, Director,
Public Services Laboratory, Georgetown University, Washington,
D.C., pursuant to Contract No. OEC-0-70-4454 with the Office
of Education, U.S. Department of Health, Education, and Welfare.

Opinions expressed in the report are those of the author
and do not necessarily represent Office of Education position
or policy.



U.S. GOVERNMENT PRINTING OFFICE 
WASHINGTON: 1974 



For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402 - Price $1



FOREWORD 



Data on educational outcomes are increasingly being sought to
better understand the results of resource commitment to education and to
provide information for parents, teachers, and administrators on the
quality of educational services. Has the output of schools increased
over time? Are more children learning and learning more? Is educational
performance better in one community than another, in one school than
another?

Comparisons traditionally made between schools and between school
districts could be improved by borrowing from the demographer's toolkit
the notion of a reweighted arithmetic average. Standardization to compare
population characteristics not under the schools' control is the kernel of
Dr. Mushkin's proposal, which is still, we realize, in an embryonic stage.
The adjustments are only for group scores, such as averages, not for
individual students' scores.
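
The notion of a reweighted arithmetic average can be sketched in a few
lines of code. What follows is only an illustration of direct
standardization under stated assumptions, not the procedure set out in
the report: the income groups, enrollment counts, and group mean scores
are hypothetical, and the function names are introduced here solely for
the example.

```python
# Minimal sketch of a reweighted (directly standardized) group average.
# All figures are hypothetical and chosen only for illustration.


def crude_mean(counts, group_means):
    """Average of group means weighted by the unit's own enrollment."""
    total = sum(counts.values())
    return sum(counts[g] * group_means[g] for g in counts) / total


def standardized_mean(standard_counts, group_means):
    """Average of the unit's group means reweighted by a standard population."""
    total = sum(standard_counts.values())
    return sum(standard_counts[g] * group_means[g] for g in standard_counts) / total


# Hypothetical district: mean grade-equivalent score for each income group.
district_means = {"low_income": 3.0, "high_income": 4.0}
district_counts = {"low_income": 80, "high_income": 20}

# Hypothetical standard population (for example, the State as a whole).
standard_counts = {"low_income": 50, "high_income": 50}

crude = crude_mean(district_counts, district_means)
adjusted = standardized_mean(standard_counts, district_means)

print(f"crude mean:    {crude:.2f}")
print(f"adjusted mean: {adjusted:.2f}")
```

With these hypothetical figures the crude mean is 3.20 and the adjusted
mean is 3.50. The district's group averages are unchanged; only the
weights differ, which is why the adjusted figure is meaningful for
group comparison only and not for any individual student's score.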

Other methods, such as that developed by Henry Dyer for New York
City schools, are available for refining school achievement comparisons.

The present report is distributed for comment or review as a part
of an exploratory study on educational outcomes supported by the National
Center for Educational Statistics. Public Services Laboratory of
Georgetown University proposed the initial draft of this report in 1971
under contract with the U.S. Office of Education.



William Dorfman, Chief 
Statistical Systems Branch 



Dorothy M. Gilford
Assistant Commissioner 
for Educational Statistics 



ACKNOWLEDGMENTS

A large debt is owed to those busy persons who gave so willingly
of their time in serving as members of an ad hoc committee established
to consider the "SIR" adjustments. It is hard to convey the author's
deep pleasure and gratitude for the opportunity to work with this
committee, whose members included: Alfred Carlson, Educational Testing
Service; William Coffman, Iowa Testing Programs; H. Russell Cort, General
Learning Corporation; Burton R. Fisher, University of Wisconsin; Nelson
Noggle, Science Research Associates.

To my colleagues at Georgetown, Henry S. Shryock and Jacob Siegel,
far more knowledgeable than the author about standardization of death
rates for differences in age of population, go a special thanks for
their patience as well as expertise. Any errors, the author hastens to
add, are her own and do not reflect on the valuable expert aid given.

Research assistants at Public Services Laboratory (PSL) helped put
together materials on testing. Their assistance has greatly improved
the document. Work was carried out over an extended period and, if
some students who aided are not mentioned, the author hopes they will
understand. A special "thank you" goes to David Bain, Steve Stageberg,
and Ellen Weissbrodt.

Staff of the U.S. Office of Education participating in the ad hoc
committee included Dorothy Gilford, Boyd Ladd, Ezra Glaser, Richard Berry,
and William Dorfman; they contributed greatly to the usefulness of the
discussions and to the cooperative spirit that all who engaged in this
committee's work will long remember.

Selma J. Mushkin 
Director, Public Services Laboratory 
Georgetown University 






CONTENTS 

Foreword

Acknowledgments

Summary

Introduction

Purposes of Interjurisdictional and Intertemporal Comparison

     Are school systems assuring equality of opportunity?

     What do children learn?

     How are the schools to be held accountable for their performance?

     What is education's role in social accounting?

Status of Achievement Testing in State and Community

     What portion of school children are now routinely tested?

     How many different tests are given?

     At what grade levels are the achievement tests given?

     What subject matter is covered in the tests?

     What norms are used?

     What demographic information is available for population
     standardization?

     Summary

Standardizing for Sex, Income, and Race

     Why SIR adjustment?

     "SIR" adjustment methods

     Toward data collection on outcomes

     Tentative findings

References

Appendix 1: Recommendations of an Ad Hoc Committee on Measurement
            of Educational Competence

Appendix 2: Characteristics of Selected Statewide Testing Programs

Appendix 3: Basic Testing Programs in Major School Systems






Tables

1. Reading Test Results, by County

2. Hypothetical "SIR" Characteristics of a State
   and 2 School Districts

3. Population and Grade Equivalent Data: Standard
   and School District 1

4. Unadjusted or "Crude" Grade Equivalent Levels
   for Reading Tests, Grade 3

5. SIR Differences in Arizona

Chart

Mathematics Achievement Scores by Grade Level Equivalent for
Pupils in Grades 6, 9, and 12 by Race and by Socioeconomic
Status of Parents



SUMMARY

The increasing use of educational performance or outcome
measurements for a range of policy purposes points to new procedures
for adjusting data for population composition. The purposes include:

     - program formulation
     - budget resource allocation
     - grant-in-aid designs
     - performance incentive payments
     - consumer information for school selection
     - program evaluation and review

This paper outlines methods for controlling population
differences to make data on performance more comparable across time and
from place to place. The demographer's tool of population
standardization has been forged anew to meet the special problems of
school performance.

The resulting estimates of achievement scores, standardized
for population differences, are useful for comparison only. Such
comparative indexes remove the influence on average scores of
population changes over time, or population differences between schools or
school districts. Adjusted scores are not intended to take the place
of the basic data but to complement them.

Standardization procedures can be applied to achievement test
scores and to other measurements of competence such as attitudes or
attributes. In this report, achievement score adjustment is used
as an example. The selection of sex, income, and race (SIR) as control
variables is proposed as a first step.



A PROPOSAL FOR A "SIR" ADJUSTED INDEX OF EDUCATIONAL COMPETENCE

INTRODUCTION

Achievement test scores are now widely used as measures of
educational performance and outcomes. There are, however, problems
in interpreting the scores of student populations with different
demographic characteristics. For some purposes, such as measures of
pupil progress, individual scores may be useful. For other purposes,
such as comparing the performance of educational units such as schools
or school districts, they can be misleading.

Direct comparisons of the performance of different schools or
school districts become, essentially, comparisons of incomparables
when pupil populations differ in demographic, socioeconomic, or
"cultural" characteristics. Adjustment of school population scores
for differences in sex, income, race, and age, when appropriate,
would reduce the bias in intertemporal and interjurisdictional
comparisons.

School districts are now comparing one school's performance to
another's. The scores of pupils in particular grades in a city are
compared with national grade equivalent norms. At the time this report
was being proposed, for example, the Chicago press in June 1971
headlined "Chicago's Pupils Get Poor Test Grades. . . . The Citywide norm
of 32 falls 18 points below the national norm of 50." (1) The size and
direction of the gaps between the city mean scores and national norms
currently define the quality of a city's schools. State systems,
moreover, are comparing school districts on dollars per pupil and other
traditional indexes of input allocations, and on reading or mathematics
achievement scores as indicators of educational output.

In reality, because of the different student population
characteristics associated with separate educational units, it is
increasingly difficult to judge from test score medians whether the
school or school district performance is in fact lower in one place
than in another, and whether an improvement has been made from one
period to another. Changes in student population composition could
introduce variations in statistical results which have nothing to do
with differences in the quality of education. It may be unwarranted
to assume, for instance, that scores that were lower one year than
the previous year necessarily reflect qualitative deterioration in
the educational program. In some instances, favorable quantitative
results do not really indicate a concurrent gain in quality. Real
changes in results may be obscured by changes in the characteristics
of the population; even though there is no change in overall test
results there may actually be qualitative improvement in some
instances, and decline in others.

Differences and changes in demographic characteristics of
States, school districts, or schools may create statistical artifacts
rather than depict real trends; achievement score data at any particular
time may show interstate or school district variations that may or
may not indicate comparative "learning" achieved. Even in two
neighboring schools, one with only girls in attendance, the other with only
boys, median or mean test scores could differ; yet each school might
have scores equivalent to the national median or mean for each sex.
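
The two-school case can be worked through numerically. The sketch
below is an illustration only, with hypothetical national means,
enrollments, and school scores; it compares each school's observed
mean with the mean expected from its own sex composition (an indirect
form of standardization), which is an assumption of this example
rather than a method stated in the text.

```python
# Hypothetical illustration of the girls-only / boys-only school example.
# National means, enrollments, and observed scores are invented figures.

NATIONAL_MEANS = {"girls": 52.0, "boys": 48.0}


def expected_mean(enrollment, reference_means):
    """Mean a school would show if each group scored at the reference mean."""
    total = sum(enrollment.values())
    return sum(n * reference_means[g] for g, n in enrollment.items()) / total


def observed_to_expected(observed, enrollment, reference_means):
    """Ratio of the observed school mean to the composition-based expectation."""
    return observed / expected_mean(enrollment, reference_means)


schools = {
    "School A": {"enrollment": {"girls": 120, "boys": 0}, "observed": 52.0},
    "School B": {"enrollment": {"girls": 0, "boys": 110}, "observed": 48.0},
}

ratios = {
    name: observed_to_expected(s["observed"], s["enrollment"], NATIONAL_MEANS)
    for name, s in schools.items()
}

for name, s in schools.items():
    print(f"{name}: crude mean {s['observed']:.1f}, "
          f"observed/expected {ratios[name]:.2f}")
```

The crude means differ by four points, yet both ratios equal 1.00:
each school performs exactly at the national level for the pupils it
actually enrolls, so the gap in crude means reflects composition
alone.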




In the past, when children with one set of characteristics have
not achieved as well as others on standardized tests, one of three
courses of action has been pursued: (1) attempts have been made to
reduce differences between subpopulation groups by removing bias in
tests; (2) multiple rather than single tests and test norms have been
developed; or (3) testing has been stopped. Early in the history of
intelligence testing (IQ tests), questions were screened for differences
in the response of girls and boys; if significant differences were found
in responses to particular questions, items were deleted from the test.
This suggests a route that might have been followed in achievement
testing when any question or group of questions elicited significant
differences between blacks and whites or other groups. Research
findings that the origin of much difference is cultural-linguistic
indicate correction of tests is in order. (2)

Through the National Assessment of Educational Progress,
educational materials are being prepared that can perhaps break from the
traditional white middle-class male biases in testing instruments.
Exercises dealing with black culture, black history, and black
literature are included. The testing exercises are objective or criteria
referenced and are not normed. And material is presented so as not
to compound difficulties; for example, tests not intended to measure
reading comprehension are administered orally. Similarly, exercises
are being examined for possible sex bias; for example, in content of
science questions. Such efforts to rid tests of cultural and other
bias may be a step toward more accurate assessment of achievement.

A second course is to develop separate tests or use separate
norms for differing groups. Separate norms for girls and boys have
been used for years, but different norms for blacks and whites
have come to be applied only recently and then only in connection with
use of test scores for college selection. Multiple norms also have been
developed. For example, norms have been set for big-city school systems
to relate scores for one core city to those for another, with big-city
equivalents generally 25 percent below national norms. Such norms add
considerable depth to our understanding of achievements by school or
school district, and provide more reliable statistical yardsticks with
which to measure comparative progress. While the measurement of
intracity and intercity differences and rates of progress is improved,
differences and changes in the underlying characteristics of the
school population can obscure the meaning of established norms. For
example, the meaning of test scores is affected by selective migration
to cities, particularly for smaller school districts with special
characteristics of "movers" and "stayers" in the student population.

What type of statistic would facilitate achievement score and
other competence comparisons across time and across jurisdictions
without adding to the display overload? The remaining sections of
this paper address this question.

PURPOSES OF INTERJURISDICTIONAL AND INTERTEMPORAL COMPARISON

We ask first, what are the purposes of statistical comparisons
of outcomes?

A number of differing purposes are propelling the Federal
Government, States, and school districts to assess achievement in the
schools. In turn, those assessments are altering structures and
policies in elementary and secondary education.




Are school systems assuring equality of opportunity?

The survey of Equality of Educational Opportunity ("The Coleman
Report"), carried out by the U.S. Office of Education under the Civil
Rights Act of 1964, sharpened the focus on school "results." Various
measures of results identified in the context of racial equality
include: (a) occupational status and mobility, (b) years of schooling,
and (c) income.

The Coleman study was on a sample of 504,000 children in
grades 1, 3, 6, 9, and 12. The children were tested on verbal ability,
nonverbal intelligence, reading comprehension, mathematics, and general
material including practical arts, humanities, natural science, and
social science. These intermediate educational achievements were
considered necessary to ultimate occupational, educational, and income
attainments. Uniform tests were given to all groups with the resultant
familiar findings: the average performance of minority pupils, except
the Oriental group, was significantly below the average for white
students; school inputs apparently compensated little for handicaps in
home and community environments.

The President, in his March 1970 statement on Elementary and
Secondary School Desegregation, called anew for equal educational
opportunity. In recommending added funds, the President said,

     I am not content simply to see this money spent, and
     then count the spending as a measure of accomplishment.
     For much too long, national "commitments" have been
     measured by the number of Federal dollars spent rather
     than by more valid measures such as the quality of
     imagination displayed, the amount of private energy
     enlisted or, even more to the point, the results
     achieved. (4)



What do children learn?

Another question posed by the President, in his March 1970
message on educational reform, gave new emphasis to the outcome of
schooling and new measures of achievement. He proposed that a
National Institute of Education take the lead in developing new
measurements of educational output. "NIE . . . would develop
criteria and measures for enabling localities to assess educational
achievement and for evaluating particular educational programs. . . . In
doing so, it should pay as much heed to what are called the
'immeasurables' of schooling (largely because no one has yet learned to
measure them) such as responsibility, wit, and humanity as it does to
verbal and mathematical achievement."

Subsidiary but related questions ask: What do children learn
compared to what they could be learning? Are children learning more
now than children did years ago?

How are the schools to be held accountable for their performance?

Within a surprisingly short period of time the concept of
"school outcomes" has come to be applied as administrative measures of
performance of schools, teachers, school districts, and so forth.
Accountability has come to be a part of current practice grounded in
the evaluation provision of Title I of the Elementary and Secondary
Education Act (ESEA) and encouraged further by new programs such as Right
to Read.

The President's March 1970 education message noted: "School
administrators and school teachers alike are responsible for their
performance, and it is in their interest as well as in the interests
of their pupils that they be held accountable. . . . Success should
be measured not by some fixed national norm, but rather by the results
achieved in relation to the actual situation of the particular school
and the particular set of pupils." Later, in his 1974 budget message,
the President set the pattern of government-wide responsibility
for program performance. Programs will be evaluated to identify those
that must be redirected, reduced, or eliminated because they do not
justify the taxes required to pay for them. Federal programs must
meet their objectives, and costs must be related to achievements. (5)

News reports on how good a job a school does are a direct
consequence of evaluation in school districts. When programs are
evaluated, results have to be made clear and simple. Evaluation of a
program requires a clear statement of purpose and a measure or measures
that quantify the essential character of that purpose. Many States
have applied management by objectives to education, along with
cost-benefit principles embodied in some form of
planning-programming-budgeting system (PPBS), and have tied statewide
educational assessment into such a system. New York is reported by ETS*
to employ an adaptation of PPBS, Program Analysis and Review (PAR), to
designate educational problem areas directly applicable to the State's
ESEA programs. (6) California also has been developing a PPB system.

Program evaluations have heightened interest in concepts of
program outputs and in data that can illuminate those concepts.
Achievement testing by schools and school districts has been
encouraged by evaluation requirements and some States have conducted
statewide testing.



* Educational Testing Service



Achievement testing is only one of many measures that might be
made of competence, affective and cognitive, created through
education. The emphasis on achievement testing, and in particular
on reading scores, represents an early and undoubtedly too simple
response to the need within a program evaluation to equate measure
to purpose, reading score to achievement level, test result to
educational outcome. (7)

Evaluation and accountability have spawned still another
species, the "performance contract," which pays contractors, teachers,
or students according to student performance. A whole new area of
contract purchases for student learning permits industry to serve the
schools by designing learning instruments, curriculum materials, and
the like. Payments according to performance have sharpened concern
about outcome measurements; the process of evaluating performance has
made it unmistakably plain how little is known about educational
outcomes and about ways of achieving educational performance.

What is education's role in social accounting? 

Social accounting, similar in concept to GNP accounting, has
received much attention. Development of human capacity is so much a
part of well-being that a measure of education is necessarily a
central variable in any index for social accounting. As a step toward
charting the Nation's social progress, a social report was developed
in 1968 as a trial effort to "examine the qualitative condition of
society regularly and comprehensively." (8) The report emphasized
the need for a national assessment of educational achievement. That
assessment, now underway through the Education Commission of the States
and the National Center for Educational Statistics, is beginning to
provide national data on the achievement of specified objectives.

Toward a Social Report noted that The Digest of Educational
Statistics contains over a hundred pages of educational statistics in
each annual issue, yet has virtually no information on how much
children have learned. The former report measured educational progress by
indicators of equal opportunity such as relative positions in society
and of society's enrichment by learning.

Among the measurements of equality were: (a) changes in
occupational patterns, (b) years of schooling completed, (c) talent loss
(percentage of persons who graduate from high school but do not go on
to college), and (d) intergenerational upward mobility.

Among the measurements of enrichment were: (a) years of
schooling, (b) rates of functional illiteracy, (c) school performance
changes over time (using standardized test scores such as the
PSAT* and SAT** scores and professional test score results), and
(d) closeness between black and white achievement test scores
(Coleman study).

The National Goals Research Staff's (NGRS) 1971 report to the
President did not undertake a second round toward a social accounting;
it did note ongoing work in the Office of Management and Budget (OMB)
to improve measurement of the "domestic health" of the Nation. Among
the data assembled by the NGRS to open emerging issues for discussion
were: (1) enrollments over time by level of education, (2) years of
schooling completed by population cohort ages 35 to 39, and (3) voter
responses to schools, as evidenced by public school bond election
results. OMB is preparing a publication on social indicators that groups
existing educational data under two social concerns: basic skills for
everyone, and opportunity for advanced learning. (11)



* Preliminary Scholastic Aptitude Test 
** Scholastic Aptitude Test 






temporal comparisons caused recent works on social indicators to

STATUS OF ACHIEVEMENT TESTING IN STATE AND COMMUNITY

The increasing importance of knowledge about educational
outcomes in policy formulation and decision-making has created a rising
demand for measures that can provide that knowledge.

What kinds of information on educational achievements are now
available that could be collected by NCES from schools, school
districts, or States? The answers to six related questions would
determine whether test score data can be collected without new surveys:

     1. What portion of school children are now routinely tested?

     2. How many different tests are given?

     3. At what grade levels are the achievement tests given?

     4. What subject matter is covered in the tests?

     5. What norms are used?

     6. What demographic information is available for
        standardization?

Information addressing these questions is drawn from several
sources: surveys made by the Educational Testing Service (12) in 1968
and 1971; the Akron Public School Survey in April of basic
testing programs used in major school systems throughout the United
States (13); and the 1970 survey of the Research Council of Greater
City Schools (14).



The ETS studies are specifically concerned with State testing
programs, defined as any organized, coordinated, centralized effort
by a State to provide some type of test materials or services. The
definition, however, includes States furnishing every conceivable
service associated with testing and States that merely offer
assistance in developing or improving local testing programs.

What portion of school children are now routinely tested?

Educational testing in the States has been encouraged by Federal
requirements for evaluation. The growth has been accelerating.
Informational materials for the ETS 1967 study were submitted by 50 State
departments of education and a selected group of colleges and
universities. In responses for that year 42 State departments reported
testing programs; eight States indicated no programs. Most of the
programs were intended principally for guidance of students. Only 17
States were using tests to help evaluate instruction and only 13 to
assess student progress.

     1. At least 2 million pupils in 10 States are tested annually
        by at least one of these five tests: California Achievement
        Test, Stanford Achievement Test, Iowa Test of Basic Skills,
        Metropolitan Achievement Test, and Science Research Associates
        Achievement Tests.

     2. Almost 6 million additional pupils are tested in extensive
        State testing programs in other States.

The recent requirement for program evaluation under Title I of
the Elementary and Secondary Education Act has greatly increased the use
of achievement tests in States and communities (with in some instances
separate reporting by sex, family income, and race of pupils). The
newer Right to Read program has also stimulated achievement measurement.
There has been increasing concern over the kinds of measurable pupil
learning and development which State educational tax dollars are buying.
According to the 1971 ETS compilation of State Educational Assessment
Programs, every State had conducted a needs assessment program, was
currently doing so, or planned to recycle a completed one. The universal
use of such programs, ETS felt, was explained by the requirement of
section 402 of ESEA, Title III, which tied needs assessment to the
receipt of Federal funds.

In addition to the individual State programs, 27 States had
participated in planning the Belmont System, a comprehensive educational
evaluation system developed with the cooperation of the U.S. Office of
Education to help consolidate and improve State reporting required by
law under several Federal aid programs.

Many States are emphasizing the formulation of statewide
educational goals, in recognition that such a set of goals is an
essential characteristic, if not prerequisite, of an educational
assessment program.

How many different tests are given ? 

The ETS survey on State testing programs reported achievement
testing batteries in 27 States in 1967 as follows:

                                                      Different
                           Programs      States      Instruments

Achievement batteries         38           27            21

The ETS survey gave the type of tests and the number of States
in which each testing instrument is used. Most of the children tested
annually are tested under local programs. The following figures were
computed from the 1971 ETS survey of State educational programs:

Tests                                                 Programs   States

Iowa Tests of Educational Development (ITED)             11        11
Stanford Achievement Tests (STAT)                         9         8
Sequential Tests of Educational Progress (STEP)           5         5
California Achievement Tests (CAT)                        5         5
Iowa Tests of Basic Skills (ITBS)                         7         6
Metropolitan Achievement Tests (MAT)                      2         2
Science Research Associates Achievement Series            5         5

     Total                                               44        42



At what grade levels are the achievement tests given?

Achievement testing is usually done by grade level. Mental
Measurements Yearbook (15) reports possible ranges:

                                                      Grades

Iowa Tests of Educational Development (ITED)          9-12*

Stanford Achievement Tests (STAT)                     1.5-2.4, 2.5-3.9,
                                                      4.0-5.4, 5.5-6.9,
                                                      7.0-9.0

Sequential Tests of Educational Progress (STEP)       4-6,* 7-9,
                                                      10-12, 13-14

California Achievement Tests (CAT)                    1.5-2.0, 2-4, 4-6,
                                                      6-9, 9-12

Iowa Tests of Basic Skills (ITBS)                     3, 4, 5,*
                                                      6, 7, 8-9

Metropolitan Achievement Tests (MAT)                  K-1.4, 7.0-9.5

Science Research Associates (SRA)                     1-2, 2-4, 3-4

*Only in 1965 volume of Mental Measurements Yearbook

Some States, notably Michigan and Pennsylvania, have set in
motion programs of statewide testing in several subject matter areas
and others, such as Colorado and Delaware, are moving in that direction.
Some States are starting with testing for a single grade level and others
restrict the tests to reading.

As of 1971, for grade groupings, the following numbers of States
reported testing programs:

Grade levels tested          Number of States testing

       4-6                              24
       7-9                              22
      10-12                             22

Special characteristics of State testing programs are summarized
in Appendix 2. Batteries of tests are often given in selected large
cities. A 1970 survey by the Research Council on Greater City Schools
reported testing in 100 major city school systems. Findings of an Akron
study on tests by grade level are shown in Appendix 3.

What subject matter is covered in the tests?

The subject matter varies by grade and test. Word meanings,
vocabulary, reading comprehension, and arithmetic computation are among
the subjects most often included.

Although the central purpose in most States is to assess the
cognitive development of students, a few States are beginning to stress
personal-social development as well. Pennsylvania includes attitudes
and noncognitive abilities that it has set as part of the schools'
purposes: among the tests are measures of self-concept, understanding
of others, citizenship, creativity, health habits, readiness for
change, and attitudes toward the school. Michigan has measured
attitudes toward learning, achieving, and self.

What norms are used?

Various types of norms developed by test publishers are being
applied in comparisons of schools, school districts, and so forth.
Nationally standardized tests establish norms from responses to the
tested material by a national sample of the school population.

Norms for tests are developed on the basis of either raw
scores--counts of correct answers--or derived scores--sets of values
which describe the test performance by some specified group, usually
shown as a table giving equivalent values of some derived score for
each raw score on the test, with the purpose of:

1. making scores from different tests comparable by
expressing them on the same scale, and/or

2. making possible more meaningful interpretations of
scores.

Types of derived scores for interindividual comparison in
standardized achievement tests include:



A. Transformations based on the mean and standard deviation
of the scores for the group (linear standard scores):

(1) Z-scores*

(2) AGCT-type scores (Army General Classification Test)

(3) CEEB scores (College Entrance Examination Board)

B. Transformations based on relative position within group:

(1) Rank

(2) Percentile ranks and percentile bands

(3) Stanines

(4) T-scores**

(5) Normalized standard scores

C. Consideration of the proportion of possible test scores:

(1) Percent placement

D. Consideration of the status of those obtaining same score:

(1) Age scores

(2) Grade-placement scores
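
The linear standard scores in group A can be sketched in a few lines. The example below is only an illustration: the raw scores are hypothetical, and the scale constants (mean 50 and standard deviation 10 for the transformed Z-score, 100 and 20 for AGCT-type scores, 500 and 100 for CEEB scores) are the conventional values for these scales, assumed here rather than taken from this report.

```python
from statistics import mean, pstdev

def linear_standard_scores(raw_scores):
    """Convert raw scores to several linear standard-score scales.

    Every scale is a linear function of z = (raw - group mean) / group SD.
    The scale constants below are conventional values (an assumption).
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population standard deviation of the group
    rows = []
    for x in raw_scores:
        z = (x - m) / sd
        rows.append({
            "raw": x,
            "z": z,                   # basic standard score
            "Z": 50 + 10 * z,         # transformed score, avoids decimals and negatives
            "AGCT": 100 + 20 * z,     # AGCT-type scale
            "CEEB": 500 + 100 * z,    # CEEB-type scale
        })
    return rows

# Hypothetical raw scores for a group of five pupils.
scores = [10, 20, 30, 40, 50]
table = linear_standard_scores(scores)
for row in table:
    print(row)
```

Because each scale is a linear transformation of the same z value, the relative standing of pupils is identical on all of them; only the units change.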

For most tests, publishers provide national norms; for some, 
regional norms are available; and in those States in which State 
testing has been done for some time — New York, Alabama, California, 
Iowa, Rhode Island, Minnesota, Pennsylvania, Michigan — statewide 
norms are available.



Problems of variations and biases in norms. Roger Lennon's
discussion of norms in a 1963 ETS paper notes:

There are good reasons for supposing that differences in
norms ascribable simply to . . . variations in norming pro-
cedures are not negligible. When we consider that to such
differences from test to test there must be added differences
associated with varying content, and with the time at which
standardization programs are conducted (including the time
of the school year), the issue of comparability, or lack of
it, among the results of the various tests may begin to be
seen in proper perspective. Empirical data reveal that
there may be variations of as much as a year and a half in
grade equivalent among the results yielded by various
achievement tests; variations of as much as 8 to 10 points
of IQ among various intelligence tests are, of course, by
no means uncommon. (16)

* Z-score or transformed standard score is a modified standard
score developed to avoid decimals and negatives.

** T-score or normalized standard score is a score that would be
equivalent to the score if the distribution had been normal.

Existing norms, in addition to being often outdated, suffer from
samples which may differ systematically from the national population.

In data gathering, norm biases are particularly important
since they distort single-time and longitudinal comparisons among
schools, school districts, and States. The reduction of norm bias
is critical because these are the kinds of comparisons most often
sought.

Translation of different test scores. NCES has completed an ANCHOR
Test study to develop score correspondence among the seven most used
reading tests (with an eighth being developed). Score correspondence
is essential to any nationwide data collection effort that leaves to
the local community and State the initial decision on what children
should learn and are learning--and the testing instruments to assess
that learning.

A feasibility survey was launched in 1969 on reading comprehension
subtests for the five most widely used standardized test batteries,
appropriate for grades four, five, and six. The reading comprehension
subtests of the Metropolitan Achievement Test, Stanford Achievement
Test, Iowa Test of Basic Skills, SRA Achievement Series, and the
Sequential Tests of Educational Progress were administered to some
830 children, each completing subtests from three batteries arranged
in random order. Correlation coefficients among the five subtests
were high: the lowest (for groups of grade four pupils) was 0.81 and
the highest (for the same grade) was 0.91.

In another feasibility study, mathematics test scores could not
be translated from one test to another.

Based on the results of the reading feasibility survey, a
major test-equating and standardization study was conducted. The
number of test instruments for which correspondence was sought was
expanded. The purposes of the study were:

1. to set up nationally representative norms for reading 
comprehension, vocabulary, and total reading scores for 
the most widely used form of the 1970 version of the 
Metropolitan Achievement Test, at levels appropriate 
for grades 4, 5, and 6; 

2. to develop tables of score correspondence between the
reading portion of the Metropolitan and corresponding 
subtests of other test batteries; 

3. to formulate new, nationally representative norms for 
the reading comprehension and vocabulary subtests of 
the other reading tests (5 or 6) ; and 

4. to estimate parallel-form reliabilities for reading
comprehension and vocabulary subtests of the test
batteries.

Criterion-referenced testing. A move away from norm setting has
come to be urged, partly because of minority group reaction, partly
because of the general overall policy uses of testing results, and
partly as a consequence of the increasing acceptance of the National
Assessment of Educational Progress. Criterion- or objective-referenced
tests are designed to measure performance on clearly stated objectives



that identify specified skills in a particular subject matter area.
Scores on such tests show the percent of correct responses on each of
the pieces of information identified as important to know, in accord
with the identified objectives. Despite the growing acceptance of
criterion-referenced instruments, problems remain of summarizing
results of percent correct responses when tests in different subject
matters have a varied range of questions (easy to hard) and when
questions within one testing instrument are of unequal difficulty. (17)

What demographic information is available for population standardization?

The development of comparative indexes of achievement requires
that information be available on demographic characteristics, at least
sex, income or parental education, race, and possibly age for each
pupil for whom achievement scores are available. Data are required on
those variables that approximate the direct SIR (sex, income, and race)
data for the school unit population of children tested.

The availability of information on achievement tests and on SIR
suggests that for the particular achievement tests given: (1) sex and
age data are probably available for almost all States and communities;
(2) race is somewhat less likely to be available; and (3) income data
(of varying qualities) are available in some States and communities but
not in most others.

Data on income cannot be obtained directly or easily from
students or teachers, and inquiries to parents may be misinterpreted
as prying by school officials. Occupational category or property
values in the school neighborhood might be used as indirect income
indicators, as could educational or occupational status of parents.



Internal Revenue statistics on income for a small area could be
related to Census tract data and, then, to school district data.
Recently, in connection with Revenue Sharing administration, tax
forms were amended to permit routine income data tabulation; and
data-matching pilot projects are needed to assure convertibility
from Census to Internal Revenue sources. And consideration might be
given to such indirect measurements as an area-wide socioeconomic
status score, based on occupation, education, and income, which was
developed in 1960 by the Census Bureau.

Census tract data from the 1969-70 Census of Population are
available on race, sex, and family income. The NCES-sponsored project
for mapping Census tracts with ELSEGIS* districts facilitates
the matching of school district data on achievement with a wide
range of Census socioeconomic information; however, obtaining similar
data within school districts poses a major problem.

For selected time periods, a national analysis is practicable
for achievement scores and changes showing separately achievement
data by sex, by income, by race, and by other characteristics.
Sources of such national data include some 20 surveys now being
compiled for NCES. However, the data are too sparse to warrant
adjustment for population subgroups. Only as added information becomes
available over the years through routine fact gathering are summary
statistical methods such as the SIR Adjusted Index indicated.

* Elementary and Secondary Education General Information Survey.

National Assessment materials becoming available show differences
in achievement scores by sex, race, and economic status
(as measured by the highest educational level achieved by either
parent). The purpose of the assessment is to provide information
about progress in the achievement of educational objectives. In
designing the program, objectives and corresponding test exercises
were carefully reviewed by scholars, educators, and lay citizens.
The testing instruments already completed or scheduled provide
samples of exercises appropriate for four age groups (9, 13, 17, and
26-35) in 10 subjects: science, reading, writing, citizenship, art,
career and occupational development, literature, mathematics, music,
and social studies.

The exercises measure knowledge, skills, and attitudes of
groups, rather than individuals, according to their: (1) age level;
(2) size and type of community (extreme inner city, inner city fringe,
extreme affluent suburb, suburban fringe, medium city, extreme rural,
and small cities); (3) four geographic regions of U.S.; (4) socio-
educational levels; (5) race; and (6) sex. Thus, the assessment
data provide the framework for required adjustments over time and
indexes based on a standard population.

National Assessment exercises differ from the standardized
tests in that the goal is estimates of group rather than individual
performance. No individual answers all the questions in each testing
instrument; different groups of questions are administered to different
samples of the population, as in public opinion polls. The intent of
the summary statistics drawn from the assessment is to indicate what
percentage of the population or of population subgroups can answer
specific questions according to the predetermined criteria.




Additional sources of data include:

(a) Project TALENT, carried out initially in 1960 as a national
sample survey of half a million high school students (grades 9 to 12),
measured human talents with a series of specially constructed and
tested measurement instruments. Information was obtained on such
student characteristics as income and sex. Race was only reported as
a school characteristic so that data on achievement scores and race
cannot be tied directly.

(b) The Health Examination Survey of 1963-65 included in its
second cycle a 60-minute test battery to assess mental aspects of
growth and development of 6 to 11 year olds. The Reading and
Arithmetic subtests of the Wide Range Achievement Test were given to
measure school achievement. Findings in their raw score form have
been presented by age, grade, and sex. Grade equivalents, percentile
ranks, and standard score equivalents of the raw scores are also
presented. (19)

(c) NCES study on reading test measurements in grades four,
five, and six will provide, in addition to test score correspondence,
data on race and sex of respondents and their family income (as judged
by the classroom teacher).

Summary

Our review of existing school testing--and data for adjusting
statistical summaries for sex, income, and race--shows that:

(1) In some States or communities, achievement score data
are available for selected grade levels and a beginning is being made
on measurements of other competencies. In a smaller number of States,
data on income, sex, and race are also available.

(2) For national cross-time comparisons, National Assessment
data are becoming available with data on sex, race, and economic status
(as measured by parental education).

(3) Complete State-by-State and school district data are not
now available, nor are the existing data comparable because of the
variety of tests given and the range of grades at which tests are
administered.

STANDARDIZING FOR SEX, INCOME, AND RACE
Why SIR Adjustment?

The differences in test scores by socioeconomic status and race
are easy enough to display when the amount of information is limited.
The chart on page 24 shows mathematics test scores in grade level equivalents for
pupils in the 6th, 9th, and 12th grades and the movements of test
scores for whites and blacks in high, medium, and low socioeconomic
status (of parents).

It would be difficult indeed to show on the same chart or table
test scores in grade level equivalents by race and socioeconomic status
of parents over time. Because of information overload the information
must be summarized so that differences can be seen more readily.
Standardization for population is a familiar way of displaying
information comparably across jurisdictions or across time.

Standardization techniques are well known, primarily in
demographic studies concerned largely with birth and death rates, but also
in other areas such as labor force participation. Similar statistical
methods could and should be applied to achievement scores and other
measures of educational outcomes or competence.



Chart. Mathematics Achievement Scores by Grade Level Equivalent for
Pupils in Grades 6, 9, and 12 by Race and by Socioeconomic Status of Parent

[Line chart. Vertical axis: test scores by grade level equivalent.
Horizontal axis: grade in school. Series: White High SES, White Medium
SES, White Low SES, Negro High SES, Negro Medium SES, Negro Low SES.]

* Grade Level Equivalent > 14.0

Source: Based on data in George W. Mayeske, et al., "Growth in Achievement for Different Racial, Regional, and
Socio-Economic Groupings of Students," U.S. Office of Education, May 1969 (processed).



A single set of national norms for all girls and boys, all
income groups, and all racial groups would be appropriate only if one
assumes that all groups: (1) have the same interests, (2) are
exposed to the same learning experience, (3) have the same opportunity
to learn, and (4) have the same verbal competencies.

It is frequently maintained that an atypical pupil or an
atypical group of pupils (atypical in terms of educational
opportunities) cannot be "fairly" judged by a test which assumes equal
educational backgrounds. Many standardized tests, for example, do
not differentiate norms by sex; girls, however, tend to score higher
on tests that are verbal in nature while boys tend to score higher on
tests that are numerical or mechanical.

Suppose we say that reading for understanding with some
competence is a basic skill that all children should master. There still
remains a set of facts that would lead us to correct scores of
achievement tests for the special characteristics of the school's population.
The vocabulary of one group may differ from that of another; the same
word in fact may convey very different meanings--a "fly," a "strike,"
a "bat." If school districts or schools are being compared--in one,
children are from homes with highly developed formal English language
skills, in the other, formal English is not a first language and there
is little family participation in child education--comparisons would
hardly be useful for assessing progress in student language skill
achievements. Even if test instruments were free of verbal biases,
it might be desirable to have separate reference norms for girls and
for boys, for blacks and for whites, for rich and for poor.






We are not disputing the argument that minimal requirements of
basic skills should be applicable to all. As the President indicated
in his 1970 message to Congress:

For years the fear of "national standards" has been one of
the bugaboos of education. There has never been any serious
effort to impose national standards on educational programs,
and if we act wisely in this generation we can be reasonably
confident that no such effort will arise in future
generations. The problem is that in opposing some mythical threat
of "national standards" what we have too often been doing is
avoiding accountability for our own local performance. We
have, as a nation, too long avoided thinking of the
productivity of schools.

This is a mistake because it undermines the principle of
local control of education. Ironic though it is, the
avoidance of accountability is the single most serious threat
to a continued, and even more pluralistic educational system.
Unless the local community can obtain dependable measures
of just how well its school system is performing for its
children, the demand for national standards will become even
greater and in the end almost certainly will prevail. When
local officials do not respond to a real local need, the
search begins for a level of officialdom that will do so, and
all too often in the past this search has ended in Washington. (4)

Study is under way on the problems of comparing school districts
and States on the basis of test performance. Test score interpretation
based on differential norms is useful in comparing school districts with
comparable characteristics. But other methods are necessary when
comparing systems with different characteristics.

Henry S. Dyer has proposed a method of computing a School
Effectiveness Index (SEI) that "automatically adjusts for the
differing circumstances in which a school must operate." His educational
accounting system has a procedure for establishing SEI profiles for a
school. The procedure calls for a series of regression analyses,
using the test scores and background characteristics from all schools
in the area within which comparisons are to be made. Measures of
quality are then determined by the distance of a school from the
regression line. (20) Dyer identifies hard-to-change as contrasted
with easy-to-change variables (or circumstances in which schools
must operate). By controlling statistically variables over which
schools have little or no control, he points out, a school and its
staff are better able to determine how effective their efforts may
be. SEI's may indicate ways in which a school staff might improve
its performance.
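
The idea of measuring a school by its distance from a regression line can be illustrated with a deliberately simplified sketch. It is not Dyer's procedure, which calls for a series of regression analyses on several background characteristics; here a single hypothetical background index is used, and all school values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def distances_from_line(xs, ys):
    """Actual score minus the score predicted from background alone."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data for five schools in one comparison area:
# a background index (hard-to-change circumstances) and the mean test score.
background = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_score = [2.2, 2.9, 3.0, 3.7, 4.2]

slope, intercept = fit_line(background, mean_score)
residuals = distances_from_line(background, mean_score)
for school, r in enumerate(residuals, start=1):
    position = "above" if r > 0 else "below"
    print(f"School {school}: {r:+.2f} ({position} the line)")
```

A school above the line scores higher than its background would predict, and a school below it scores lower; the sign and size of that distance, not the raw average, is what such an index reports.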

To illustrate the problem further, we drew on 1970-71 data
from Arizona showing reading test results by county (table 1) in a test given
to third graders. (21) Average grade equivalents are below the third-
grade level in all counties except Yavapai, with a range of 2.6 to
3.0. Assuming there is a significant difference between 2.6 and 3.0,
is it a result of differences in educational quality or is it the
result of differences in population characteristics in Yavapai on
the one hand and Apache on the other?
Table 1. Reading Test Results, by County

County              Average Grade Equivalent

Apache                        2.6
Cochise                       2.9
Coconino                      2.8
Gila                          2.9
Graham                        2.8
Greenlee                      2.6
Maricopa                      2.9
Mohave                        2.9
Navajo                        2.8
Pima                          2.8
Pinal                         2.7
Santa Cruz                    2.6
Yavapai                       3.0
Yuma                          2.8



"SIR" Adjustment Methods

In comparing any two school districts, States, other population
groups, or the same population group or community at various points of
time, control for differences in race, sex, and income distribution
permits a more realistic picture of educational outcomes.
Standardized outcome indexes are meaningful only for comparison. But the
adjusted scores, when used in conjunction with unadjusted ones,
contribute greatly to an understanding of the data for comparing
educational achievements in different places, or at different times.

The most common methodology in population standardization is
to assume a standard set of demographic characteristics for all areas,
or all dates being studied. Specific rates or achievement scores for
each of the subpopulations in each area or at each time period are
then applied to the standard population. This calculation would
show, for example, achievement scores which would have been
experienced if the sex, income, and race characteristics of the school
district had been the same as those in a standard population. Since
the standard population is applied to all communities or time periods
being studied, differences in race, sex, and income composition are
removed or are held constant in making the comparisons. The specific
mechanics for computing standardized rates can vary. Two general
methods are outlined here: (1) community average specific achievement
scores weighted by a standard population, and (2) an index of educational
achievement in which the specific achievement scores for a standard
population are compared with the standard scores for a standard
population.
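
The first method can be written compactly. The notation below is introduced only for illustration and does not appear in the report: $s_{d,i}$ is the average score of subpopulation $i$ in district $d$, $w_{d,i}$ is that subpopulation's share of the district's own tested population, and $w^{\mathrm{std}}_{i}$ is its share of the standard population, with $k$ subpopulations in all.

```latex
\bar{S}^{\,\mathrm{crude}}_{d} \;=\; \sum_{i=1}^{k} w_{d,i}\, s_{d,i},
\qquad
\bar{S}^{\,\mathrm{adj}}_{d} \;=\; \sum_{i=1}^{k} w^{\mathrm{std}}_{i}\, s_{d,i},
\qquad
\sum_{i=1}^{k} w_{d,i} \;=\; \sum_{i=1}^{k} w^{\mathrm{std}}_{i} \;=\; 1 .
```

The two averages use the same district-specific scores and differ only in the weights, so any gap between them reflects population composition rather than the scores themselves.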






Community average specific achievement scores weighted by a standard
population. For example, assume a standard population characteristic
in a State with the following racial composition: white, 55 percent;
black, 16 percent; other (including American Indian), 29 percent. The
sex distribution is somewhat weighted in favor of females at 51 percent.
Income is shown in three classes only (a classification that seems
too undifferentiated, but represents existing practices within a
State). The income classes show 30 percent of the population with
incomes under $3,000; 60 percent with incomes between $3,000 and
$10,000; and 10 percent with incomes $10,000 and over.

Table 2 (see page 30) illustrates the components of a standard
population by sex, income, and race. For this particular illustration,
three classes of race are indicated and three income groups.

"SIR" adjusted achievement test scores in school districts in
the State can be calculated by multiplying average specific scores for
each of the 18 subpopulation groups or categories (composed of three
race categories, the two sexes, and three income classes) by the
corresponding standardized population distribution for each
subpopulation. In School District 1 the white population is shown as a
larger percentage of the total than in the standard population, and
the black a substantially smaller percentage. By the same token, a
smaller proportion of the population is in the under-$3,000 income
class with an average score of 2.6, and more than double the standard
population is in the $10,000-and-over income class with an average
score of 5.1. School District 2's characteristics are assumed to come
closer to those of the standard for the State as a whole.
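
How the 18 subpopulation groups arise can be sketched as follows. The marginal shares are the standard population percentages of the illustration as best they can be read here. The report gives only the marginal distributions in this passage, so the sketch assumes, purely for illustration, that sex, income, and race are statistically independent; actual cell shares would come from a cross-tabulation of the tested population.

```python
from itertools import product

# Marginal shares of the standard population (proportions, not percent).
race = {"white": 0.55, "black": 0.16, "other": 0.29}
sex = {"male": 0.49, "female": 0.51}
income = {"under_3000": 0.30, "3000_to_10000": 0.60, "10000_plus": 0.10}

def joint_weights(*marginals):
    """Build cell weights as products of marginal shares.

    ASSUMPTION: the characteristics are independent. This is a
    simplification made for the example, not a claim of the report.
    """
    weights = {}
    for combo in product(*(m.items() for m in marginals)):
        labels = tuple(label for label, _ in combo)
        w = 1.0
        for _, share in combo:
            w *= share
        weights[labels] = w
    return weights

std_weights = joint_weights(race, sex, income)

print(len(std_weights), "subpopulation groups")
print("weights sum to", round(sum(std_weights.values()), 6))
```

Three race categories, two sexes, and three income classes give 3 x 2 x 3 = 18 cells, each of which needs both a weight and an average score before an adjusted score can be computed.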






Table 2. Hypothetical "SIR" Characteristics of a State and 2
School Districts

(In percent)

                      Standard Population    School        School
                      Characteristics        District 1    District 2
                      in State

Race
  White                      55                 80            60
  Black                      16                  8            10
  Other                      29                 12            30
                            100                100           100

Sex
  Male                       49                 49            50
  Female                     51                 51            50
                            100                100           100

Income
  Under $3,000               30                 17            35
  $3,000-$10,000             60                 62            55
  $10,000 +                  10                 21            10
                            100                100           100






There is no right number of population characteristics to be
used in standardization. The number of combinations of specific rates
depends basically on the number of groupings or classifications considered
useful for adjusting for sex, income, and race. For example, average
grade equivalent information for Arizona differentiated five racial
groups, showing average grade equivalents for third-year reading
tests for each group. In other States the Indian population, for
example, or those with Spanish surnames, may not account for so
large a portion of the population as to warrant separate
classification.

Income is estimated in the State of Arizona by teachers. The
data are compiled in the State on the basis of teacher information
for three broad income groupings; smaller spreads and thus more
classes of income might be desirable in showing achievement score
differences for standardized populations. The difficulties of
greater precision in income reporting relying on teacher observations
are great if not insurmountable.

In all, some 30 combinations of SIR-specific scores might be a
reasonable number of combinations that would permit achievement scores
to be shown as SIR-specific rates and for which averages could be
computed and reweighted for comparative purposes. This number would
consist of three racial subgroups, five income classes, and two sex groups.

There is no right way of selecting a comparative standard for
demographic groups across either time or communities. The standard
depends primarily on the units subject to comparison. If, for
example, school districts within a State are compared, the standard
could be the average race, income, sex composition for the State as
a whole (which is the process assumed in the examples shown), or any
one of the school districts could be used as a standard against which
other school districts would be contrasted, or a national standard
could be applied. Standardization of demographic characteristics
for intertemporal comparisons permits the use of the standard
population in a base year, the latest year, or some intervening year.

[Table 3: hypothetical third-grade reading scores, in grade equivalents,
for the 18 subpopulation groups of the standard population and School
District 1. The table body is illegible in the source copy.]

Grade equivalent scores for each of the 18 subpopulation groups
of the standard and District 1 populations are shown in table 3
(page 32). In this table of hypothetical third-grade reading scores,
families with incomes of $10,000 and over in the standard population
have grade scores averaging 4.3 for the whites, 4.0 for the blacks,
and 3.6 for other races. For the under-$3,000 income group in the
standard population, the scores averaged 2.9 for whites, 2.0 for
blacks, and 1.9 for other races. In the standard used here--the
statewide average--the scores appear lower than those in School
District 1.

The unadjusted averages or raw scores for reading tests at
grade-three-equivalent levels are summarized in table 4 for the State
and the two school districts. Thus, the statewide standard score is
shown to average 2.6; the average for School District 1, 4.1; and for
School District 2, 2.9.

Table 4. Unadjusted or "Crude" Grade Equivalent Levels for
Reading Tests, Grade 3

Statewide Standard Score Average      2.6
School District 1 Average             4.1
School District 2 Average             2.9

For each of the components of SIR, the kinds of differences
drawing on the Arizona county data are illustrated in table 5 (page 34).



Table 5. SIR Differences in Arizona

Average grade equivalents differ statewide as follows:

By race:
    White                  3.7
    Spanish surnamed       2.8
    Black                  2.7
    Indian                 2.6
    Oriental               4.1

By sex:
    Male                   3.3
    Female                 3.6

By income:
    Below $3,000           2.6
    $3,000 - $10,000       3.4
    Above $10,000          4.2

A hypothetical school district, District 1, which on raw scores
averages 4.1 for reading tests at grade three levels, has a reduced
score of 3.6 when corrected for population differences; District 1 has
a larger proportion of whites and/or higher income groups than is
"standard" for the State. It has a higher unadjusted score than would
in fact be attributed to it if it had a standard population distribution.

Standardizing for population differences thus changes the
unadjusted or raw score average for School District 1 from 4.1 to 3.6. If
the population distribution by sex, income, and race in School District
1 had been the same as the statewide average (if the proportion of the
high income class were lower and the proportion of whites were somewhat
lower), the average would be lowered from the raw average shown.

An index of educational achievement in which the specific achievement
scores weighted for a standard population are compared with the
standard achievement scores for a standard population. This index asks
the question: Is the achievement score in the District (or State) higher
than the "average" or not, and by what percentage? For time-period
data the index would ask: Is the achievement score higher or lower
in one year than in some base period?

This measure, standardized for population, would call for the
computation of an average adjusted score. This score would be the
sum of specific scores weighted by the population standardized for
sex, income, and race by grade level divided by the standard scores
for the standard population (multiplied by 100).

Adjusted achievement scores by color, sex, and income are
computed by using, in the numerator, average scores specific for color,
sex, and income multiplied in each instance by the distribution of the
standardized population for those characteristics.

                   Sum of the products of the SIR
                   specific rates multiplied by
                   standard population distribution
Adjusted score  =  --------------------------------  x 100
                   Sum of the products of the norm
                   or standard scores times
                   standard population distribution
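
The ratio above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's own computation: the three subgroups, their scores, and their population shares are hypothetical stand-ins for the 18 sex-income-race cells, and the function names are introduced here only for the example.

```python
def weighted_average(scores, weights):
    """Average of subgroup scores, weighted by subgroup population."""
    total_weight = sum(weights.values())
    return sum(scores[g] * weights[g] for g in weights) / total_weight


def sir_adjusted_score(specific_scores, standard_scores, standard_population):
    """Return (adjusted average, index) for one district.

    The adjusted average weights the district's own subgroup scores by the
    standard population. The index divides that figure by the standard
    scores weighted the same way and multiplies by 100.
    """
    adjusted = weighted_average(specific_scores, standard_population)
    norm = weighted_average(standard_scores, standard_population)
    return adjusted, 100.0 * adjusted / norm


# Hypothetical data: shares of the standard population and grade-equivalent
# scores for three illustrative subgroups (not taken from the report).
standard_population = {"low": 0.5, "middle": 0.3, "high": 0.2}
standard_scores = {"low": 2.0, "middle": 3.0, "high": 4.0}
district_scores = {"low": 2.5, "middle": 3.5, "high": 4.5}

adjusted, index = sir_adjusted_score(
    district_scores, standard_scores, standard_population
)
print(round(adjusted, 2), round(index, 1))
```

An index above 100 in this sketch means the district's subgroups score above the standard once both are weighted by the same population.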

When average crude scores for a State or school district are
available but scores for each subgroup are not, some indirect methods
of adjustment become necessary. The two variants of the earlier
methods are presented here.

Variant method 1 -- Scores adjusted for standard populations when
specific rates are not available for each time period or school
district for which comparisons are made. Average crude scores could be
adjusted on the basis of the ratio of the standard scores for a
standard population to the standard scores weighted by the specific
characteristics of the school population. What needs to be known is
the population characteristic of the State or school district but not
necessarily the specific scores for each identified subpopulation.

The crude average score in this process is multiplied by a
ratio. The numerator in that ratio represents the sex, income, and
race standard rates weighted by the composition of the standard
population distribution (in other words, the weighted average score for
the standard). The denominator of the ratio represents the average
scores obtained at standard scores for sex, income, and racial groups
weighted by the subpopulation distribution of the particular State
(school district or time period).
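
A short Python sketch of this indirect adjustment follows. It assumes only what the text describes: a crude district average, standard subgroup scores, and the two population distributions. The subgroup labels, the numbers, and the function names are hypothetical and chosen for illustration.

```python
def expected_score(standard_scores, population):
    """Standard subgroup scores averaged over a given population mix."""
    total = sum(population.values())
    return sum(standard_scores[g] * population[g] for g in population) / total


def variant1_adjusted(crude_average, standard_scores,
                      standard_population, district_population):
    """Crude average times the ratio of the standard scores weighted by the
    standard population to the standard scores weighted by the district's
    own population."""
    numerator = expected_score(standard_scores, standard_population)
    denominator = expected_score(standard_scores, district_population)
    return crude_average * numerator / denominator


# Hypothetical inputs: the district has relatively more high-scoring groups.
standard_scores = {"low": 2.0, "middle": 3.0, "high": 4.0}
standard_population = {"low": 0.5, "middle": 0.3, "high": 0.2}
district_population = {"low": 0.2, "middle": 0.3, "high": 0.5}
crude_average = 3.8

adjusted = variant1_adjusted(
    crude_average, standard_scores, standard_population, district_population
)
print(round(adjusted, 2))
```

Because the district's population mix favors the higher-scoring groups, the ratio is below 1 and the adjusted figure falls below the crude average, which is the direction of correction the text describes for District 1.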

Variant method 2 -- Index of specific achievement scores compared
to the standard or averages for the State or school district. The
variant in this instance, as in Variant method 1, is useful when
average crude scores for a State or school district are available
but scores for each subgroup are not.

The crude average score for a school district would be divided
by standard scores weighted by population characteristics of the school
district and multiplied by 100 to derive the index. The computation
essentially shows the crude achievement score as a ratio to the
average of the standard scores for the same specific population.
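
As a rough sketch of this index in Python, the example below takes district enrollment counts rather than shares and normalizes them internally. The counts, scores, and the function name `variant2_index` are hypothetical and are not drawn from the report.

```python
def variant2_index(crude_average, standard_scores, district_enrollment):
    """Index of the crude average against the score the district would be
    expected to earn if each of its subgroups scored at the standard."""
    total = sum(district_enrollment.values())
    expected = sum(
        standard_scores[g] * n for g, n in district_enrollment.items()
    ) / total
    return 100.0 * crude_average / expected


# Hypothetical district: enrollment by subgroup and one crude average.
standard_scores = {"low": 2.0, "middle": 3.0, "high": 4.0}
district_enrollment = {"low": 200, "middle": 300, "high": 500}

index = variant2_index(3.8, standard_scores, district_enrollment)
print(round(index, 1))
```

Only the district's population composition and its overall average are needed here; no subgroup scores for the district enter the calculation.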

The two general methods of standardizing for sex, income, and
race differences are summarized below, along with the variants that
can be used when these methods of standardizing are used to yield
specific sets of numbers:

1. Average of specific scores weighted by Standard Population.

2.  Specific Scores weighted by Standard Population Distribution
    ------------------------------------------------------------  x 100
    Standard Scores weighted by Standard Population Distribution

Variant 1.
                             Standard Scores x Standard Population
    Average crude scores  x  --------------------------------------------
                             Standard Scores x School District Population

Variant 2.
    Specific Scores weighted by School District Population
                 (Average or Crude Score)
    ------------------------------------------------------  x 100
    Standard Scores weighted by School District Population

The formulas result in these numbers for the School District 1
example.

Method 1 yields a SIR adjusted rate of 3.6.

Method 2 yields an adjusted score index of 138, or (3.6 / 2.6) x 100.

The variant of method 1 yields 3.6, or 4.1 x (2.6 / 3.0).

The variant of method 2 yields 137, or (4.1 / 3.0) x 100.
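
The arithmetic behind these figures can be checked directly. The sketch below takes the four averages of the School District 1 example as read here (2.6, 3.0, 3.6, and 4.1); the variable names are assumptions introduced for the check.

```python
# Averages from the School District 1 example (grade-equivalent scores).
standard_average = 2.6              # standard scores, standard population
district_mix_standard_scores = 3.0  # standard scores, district population
sir_adjusted = 3.6                  # district scores, standard population
crude_average = 4.1                 # district scores, district population

# Method 2: adjusted score as a percentage of the standard average.
method2_index = 100 * sir_adjusted / standard_average

# Variant 1: crude average scaled by the ratio of the two standard-score
# averages; no district subgroup scores are required.
variant1 = crude_average * standard_average / district_mix_standard_scores

# Variant 2: crude average as a percentage of the standard scores weighted
# by the district's own population.
variant2_index = 100 * crude_average / district_mix_standard_scores

print(round(method2_index), round(variant1, 1), round(variant2_index))
```

The variant of method 1 comes to roughly 3.55 before rounding, so it only approximates the directly standardized 3.6; the two agree exactly only when the district's subgroup scores stand in a constant ratio to the standard scores.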



Stated differently, the several indexes for School District 1 might
be summarized in this way for the third grade reading score:

. . . If there were a standard population and it had
standard scores, the average would have been 2.6.

. . . If school district 1 population characteristics are
taken into account, but scores are standard, the average
score becomes 3.0.

. . . If the school district had a standard population and
scores equal to its own experience, the average becomes 3.6.

. . . If the school district is assessed in terms of its own
scores and its own school district population, the average
is 4.1.

Standardization processes can be varied further, depending upon
the kinds of comparisons sought. And the comparison can be made in
terms of index numbers to emphasize the comparative nature of the
estimate.



For certain analyses, it is clearly desirable to apply more
intricate forms of standardization; procedures outlined above involve
only the application of a standard set of demographic variables to
various achievement rates. Such standardization, therefore, is
basically a form of weighting, and the standardized rate is a
weighted arithmetic mean.
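
The weighting can be written out compactly. The notation below is introduced here for illustration and does not appear in the report: for each sex-income-race subgroup g, P_g and p_g are the shares of the standard and the district populations (each set summing to 1), and S_g and s_g are the standard and the district-specific scores.

```latex
\begin{align*}
\text{standard average} &= \sum_g P_g S_g, &
\text{adjusted score (method 1)} &= \sum_g P_g s_g, \\
\text{standard scores, district mix} &= \sum_g p_g S_g, &
\text{crude average} &= \sum_g p_g s_g, \\
\text{index (method 2)} &= 100\,\frac{\sum_g P_g s_g}{\sum_g P_g S_g}, &
\text{index (variant 2)} &= 100\,\frac{\sum_g p_g s_g}{\sum_g p_g S_g}, \\
\text{variant 1} &= \Bigl(\sum_g p_g s_g\Bigr)\,
  \frac{\sum_g P_g S_g}{\sum_g p_g S_g}.
\end{align*}
```

Each quantity is a weighted arithmetic mean, or a ratio of two such means; the methods differ only in which scores are paired with which population weights.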

More than averages are probably needed to understand variations
in educational achievement within subgroups, and it might be desirable
at some later date to consider more complex adjustments.

Toward Data Collection on Outcomes

A major NCES role in educational outcomes data requires
preparation for the collection of data on achievement scores and
socioeconomic status of children. If educational outcomes are to be linked
to program inputs and program financing, then more complete
nationwide data must be obtained. The machinery for such collection has to
be built and, once designed, a strategy for implementation put into
practice.

The Elementary and Secondary Education General Information
Survey (ELSEGIS), conducted by NCES, after review and revision, may
ultimately be the instrument for collection of data on achievement
score outcomes. ELSEGIS includes a survey of expenditures and revenues
of LEA's by source and account; a nationally representative sample of
districts is surveyed. Earlier the Belmont Survey* of the Bureau of
Elementary and Secondary Education undertook to collect data on
finances and program evaluation materials. Some collection processes
should be developed so that comprehensive data can become available for
policy formulation not only at the national level but in the States and
communities as well.

* In 1968 the Council of Chief State School Officers and the U.S.
Office of Education undertook to jointly develop and implement a
comprehensive educational evaluation system. The initial meetings
took place at Belmont House in Elkridge, Maryland, and the program
has been known as the Belmont project. More recently the Committee
on Evaluation and Information Systems of the Council of Chief State
School Officers has, in accord with one of its purposes, begun the
formulation of recommendations on information required for evaluation
as part of a State-local information system. (22)

A March 1970 report of the Committee on Educational Finance
Statistics to the U.S. Commissioner of Education paid special attention
to the need for comprehensive comparative data for policy decisions. (23)
While noting that the Committee had not given much attention to student
achievements and attainment of other program objectives, the report
noted the need for relating expenditures to educational impact and
proposed data collection on educational impact as follows:

1. Number of pupils below "minimum achievement standard"
(such as the fourth stanine) in reading and math per
average daily attendance at various grade levels; e.g.,
3, 6, and 9. (Where statewide achievement testing
results are available, as in New York, Alabama, Rhode
Island, Michigan, Minnesota, Pennsylvania, and California.)

2. Number of pupils below "minimum achievement standard" in
reading and math per number of title I eligibles at
various grade levels. (The Committee's recommendations in
this area--educational impact--only illustrate types of
data that, if available, would be desirable.)

For each of these common denominators, comparisons should be
made between data from local schools, school districts, and
State data on the same item.
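
The two proposed ratios could be computed along the following lines. This is only a sketch: it assumes that "below the fourth stanine" means stanines 1 through 3, and the scores, attendance figure, and count of title I eligibles are invented for the example.

```python
def below_standard_rate(stanines, denominator, minimum_stanine=4):
    """Pupils scoring below the minimum stanine, per unit of the chosen
    denominator (average daily attendance or title I eligibles)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    below = sum(1 for s in stanines if s < minimum_stanine)
    return below / denominator


# Hypothetical grade 3 reading stanines for one school.
grade3_stanines = [1, 2, 3, 3, 4, 5, 5, 6, 7, 9]
average_daily_attendance = 9.5
title1_eligibles = 5

per_ada = below_standard_rate(grade3_stanines, average_daily_attendance)
per_title1 = below_standard_rate(grade3_stanines, title1_eligibles)
print(round(per_ada, 3), round(per_title1, 3))
```

The same function can be applied to school, district, and State data so that the resulting rates are compared on a common denominator.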

The Committee, however, did not take account of the need for
adjusting test scores for population differences.

Tentative findings 

1. At present a body of readily available data on achievement
scores and "SIR" is not sufficient to yield State-by-State estimates
or to relate outcomes and inputs by State; selected school district
data for large districts are more nearly available.

2. Extensive achievement testing is going on in the Nation's
largest cities. One of six national standardized tests is being used
in the lower grades of those cities.

3. None of the present methods of achievement testing and
standardizing norms provides the data necessary to compare
performance levels of schools or school systems with different demographic
characteristics.

4. The Anchor test work provides the first mechanism for
translating from one test score to another among the major reading
comprehension tests.

5. Various methods need to be developed to assure that SIR
data and, in particular, appropriate income data become available to
match achievement testing. The Internal Revenue data appear
potentially the most [word illegible] body of income information.

6. Achievement clearly is only one among several major
educational outcomes. Other measures of competence should be actively
pursued. The measures discussed in PSL's* report on educational
outcomes demonstrate that achievement in designated subject matter is but
a part of the development of intellectual competence.

* Public Services Laboratory of Georj;etown University 






REFERENCES

1. "Chicago's Pupils Get Poor Test Scores." Chicago Today,
June 16, 1971.

2. First National Conference on Testing in Education and Employment.
Oral Discussion, Hampton Institute, Hampton, Virginia,
April 1-3, 1973.

3. Coleman, James S., et al. Equality of Educational Opportunity.
Washington, D.C.: Government Printing Office, 1966.

4. Messages of the President to the Congress. Message on Education
Reform, March 3, 1970.

5. Budget Message of the President. The Budget of the United States
Government Fiscal Year 1974. Washington, D.C.: Government
Printing Office, 1973.

6. Dyer, Henry S., and Rosenthal, Elsa. An Overview of the Survey
Findings in State Educational Assessment Programs. Princeton,
N.J.: Educational Testing Service, 1971.

7. Levine, Robert A. The Poor Ye Need Not Have With You: Lessons
from the War on Poverty. Cambridge: The MIT Press, 1970, p. 144.

8. Levine, Donald M., ed. Performance Contracting in Education - An
Appraisal. Englewood Cliffs, New Jersey: Educational Technology
Publications, 1972.

9. U.S. Department of Health, Education, and Welfare. Toward a
Social Report. Washington, D.C.: Government Printing Office,
1969, p. [illegible].

10. National Goals Research Staff. Toward Balanced Growth: Quantity
with Quality. Washington, D.C.: Government Printing Office, 1970.



11. Tunstall, Daniel. "Working Outlines for OMB Social Indicators
Publication," May 1973. (Unpublished manuscript)

12. Educational Testing Service. State Testing Programs: A Survey of
Functions, Tests, Materials, and Services. Princeton, N.J.:
Educational Testing Service, Evaluation and Advisory Service,
March 1968.

13. Akron Public Schools. "Basic Testing Programs Used in Major School
Systems Throughout the United States." Akron, Ohio: Akron Public
Schools, April 1968.

14. Unpublished, informal survey by the Research Council on Greater
City Schools, 1970.

15. Buros, Oscar Krisen, ed. The 7th Mental Measurements Yearbook.
Highland Park, N.J.: Gryphon Press, 1972.

16. Quoted by Richard M. Jaeger in "A National Test Equating Study in
Reading" (processed, undated).

17. Mushkin, S. J. "National Assessment and Social Indicators."
Monograph, Washington, D.C.: Government Printing Office,
forthcoming.

18. Project TALENT. "A National Inventory of Aptitudes and Abilities."
Bulletin No. 1, November 1959. University of Pittsburgh, Project
Talent Office, Washington, D.C.

19. National Center for Health Statistics. School Achievement of
Children 6-11 Years as Measured by the Reading and Arithmetic
Subtests of the Wide Range Achievement Test. Vital and Health
Statistics. PHS Pub. No. 1000 - Series 11 - No. 103. Washington,
D.C.: Government Printing Office, June 1970.






20. Dyer, Henry S. "Toward Objective Criteria of Professional
Accountability in the Schools of New York City." Phi Delta Kappan,
Vol. [illegible], No. 4, Dec. 1970, pp. 206-211.

21. Arizona Department of Education. 1970-71 Third Grade Reading
Achievement Test Results Report. Phoenix: Arizona Department of
Education, June 1971.

22. Committee on Evaluation and Information Systems. "Bylaws of the
Committee on Evaluation and Information Systems (CEIS) of the
Council of Chief State School Officers." Washington, D.C.,
September 15, 1972.

23. Kelly, James A. "Report of the Committee for Educational Finance
Statistics: Recommendations for Data Collection, Analysis and
Publication." New York: Columbia University, Teachers College,
March 1970.

24. Public Services Laboratory. "Educational Outcomes: An Exploratory
Review of Concepts and Their Policy Application." Washington, D.C.:
Public Services Laboratory, April 1972.






APPENDIXES



APPENDIX 1: RECOMMENDATIONS OF AN AD HOC COMMITTEE ON MEASUREMENT
OF EDUCATIONAL COMPETENCE

An ad hoc committee was established by the Public Services
Laboratory of Georgetown University in July 1971 to: (1) review the
concept of an adjusted educational achievement index, and (2) explore
possibilities of applying a standardization of population for sex,
income, race, and age (SIRA) differences to data collected on
achievement test scores in comparing achievement scores among jurisdictions
and across time. The committee met July 29 and September 25, 1971, with
staff members of the National Center for Educational Statistics (NCES)
and the Public Services Laboratory (PSL) of Georgetown University.

Members of the Ad Hoc Committee on Educational Achievement
Measurement were:

Office of Education     Outside Consultants including PSL Staff

Dorothy Gilford         Alfred Carlson, Educational Testing Service
Boyd Ladd               William Coffman, Iowa Testing Programs
Ezra Glaser             H. Russell Cort, General Learning Corporation
Richard Berry           Burton R. Fisher, University of Wisconsin
William Dorfman         Virginia Herman, PSL, Georgetown University
                        Selma J. Mushkin, PSL, Georgetown University
                        Nelson Noggle, Science Research Associates

At the first meeting, committee members considered a
preliminary statement of the need for a "SIR" adjustment, looked at
technical and administrative problems, and discussed methods of
adjusting for differences in population characteristics. The meeting ended
with a tentative list of recommended steps.



The second meeting focused on the role of NCES in data collection
and on needs for educational outcome measures. It discussed in greater
detail methods of adjusting achievement and other competence scores.
Finally, the committee agreed to recommend next steps that would:
(a) produce a body of knowledge on what achievement tests are given to
what children by what jurisdictions, (b) provide information on how
achievement test data are being used, and (c) seek to provide measures
on outcome (through a SIRA-type of adjustment or otherwise) that would
reduce possible misinterpretation.

The committee at its September 1971 meeting recommended: 

1. That a survey be made on the extent of data on achievement
scores. Study of State testing programs to update and elaborate the
1967 ETS compilation was proposed to determine what tests are being
given to what children, where, and on what forms the data are reported.
It may be feasible to tie a survey such as this into the "Longitudinal
Study of a Representative Sample of the High School Class of 1972" now
being conducted by NCES. In the intervening period ETS has completed
a new survey of State uses of educational achievement tests. (6)




2. That the data problems in SIRA adjustments be reviewed and
analyzed. A proposed study of the information sources on SIRA by State
and school district would include the types of data, definitions,
frequency of reporting, and application as a SIRA adjustment. Special
attention should be given to the time lag between the need for
decision-making or policy-making data and its actual availability.

3. That a small-scale test of SIRA be conducted. A study, in
one or more pilot States, of SIRA adjusted achievement scores would
ascertain: (a) the problems in gathering, reporting, and interpreting
such scores, and (b) use of adjusted scores by States and districts.



4. That current uses of educational achievement scores as
statistical materials be determined. Included in this study would be
an in-depth examination of the current uses of achievement score data
in each State. The study would build on the review suggested in
Recommendation 1 and would determine the purposes of States and communities
in using achievement test scores, for example interjurisdictional fund
allocations, payment incentives and budgetary decisions within a
government.

Dr. H. Russell Cort had reservations about making adjustments
in achievement test data for evaluation. "For research purposes, it's
certainly necessary and desirable . . . to be able to adjust groups for
the purposes of comparison." But for purposes of direct practical
decision-making, "the adjusting of test scores to take account of
variations in population may, in fact, lead to either expectations or
conclusions that are not desirable." Dr. Burton R. Fisher shared these
reservations, although he believed that in selected instances SIRA
adjustments may aid in decisions.

Dr. Cort concurred with Recommendations 1 and 2 but had
reservations about 3 and 4. He said:



     The pilot project would be a very desirable thing although
     I find it difficult to reconcile even the notion of the
     pilot project with a value conviction on my part that once
     the Pandora's Box of adjustments is opened, the consequences
     of adjusted data are apt to be destructive or deleterious and
     ultimately beyond the control of the data provider. However,
     as we discussed the approach on September , 1971, it was
     agreed that, hopefully, a State that would be involved in
     trying out the use of adjusted data would have agreed to make
     use of it and hopefully would have agreed to explore in some
     detail the implications and actual effects of providing or
     publishing adjusted comparative data among school systems.






APPENDIX 2: CHARACTERISTICS OF SELECTED STATE-WIDE TESTING PROGRAMS*

A limited number of State education agencies were asked by the
NCES in Spring 1973 for information on achievement test data
representative of individual school districts.

Nine States were contacted: California, Florida, Iowa,
Massachusetts, Michigan, Mississippi, New York, Texas, and Virginia. The
findings reported for the nine States are summarized below:

California: Collects test data annually on the universe of
students in grades 1, 2, 3, 6 and 12. The Cooperative Primary Reading
Test is used at grades 1, 2 and 3. The California Test of Basic Skills
is used at grade 6. The Iowa Tests of Educational Development is used
at grade 12. Distribution of scores by percentile by district could
be made available.

Florida: Collects test data annually on the universe of 9th
and 12th grade students (for purposes of high school program selection
and for scholarship eligibility). In 1972, collected reading test
data on a State-wide sample of 2nd and 4th grade students. Inferences
can be made about districts, but the data are not technically district
representative.

Iowa: Does not have an official statewide testing program.
It is estimated that 425 of the State's 440 districts (over 90%)
voluntarily test students on an annual basis using ITBS and ITED.**
The State department does not have data. Release would require
permission of each individual school district.

* Based on a June 14, 1973, memorandum to Selma Mushkin from Kathy
Wallman.

** Iowa Tests of Basic Skills and Iowa Tests of Educational Development,
respectively.



Massachusetts: Has no statewide testing program. Two years
ago all 4th grade students were tested in a sample of 57 school systems,
using the California Test of Basic Skills.

Michigan: Collects test data annually on the universe of
students in grades 4 and 7. The Michigan Educational Assessment
Battery, including Reading and Mathematics portions, is used. Deciles
are available for all districts.

Mississippi: Has a voluntary statewide testing program at
grades 5 and 8. Approximately 120 of 150 districts, or 85% of total
enrollment, have participated. For those districts which have
participated, it would be possible to obtain percentile distributions
by district.

New York: Collects test data annually on the universe of
students in grades 3 and 6. The New York State Reading and
Mathematics tests are used. Stanines are available by district.

Texas: Has no statewide testing program.

Virginia: Collects test data annually on a universe of
4th, 6th, 9th and 11th grade students. SRA is used at the elementary
level; STEP is used at the secondary level. Mean scores are available
(published) for each district. Data are not stored in automated form;
an investigator would have to use individual tests (hard copy) to
analyze data further (e.g., to obtain percentile distributions by
district).

Citations have been received by NCES suggesting that data on 
State testing programs may be available in the following additional 
States: Arizona, Delaware, Georgia, New Jersey, Nevada, and Wisconsin. 






APPENDIX 3: BASIC TESTING PROGRAMS IN MAJOR SCHOOL SYSTEMS



Selected testing instruments by grade level

Column headings: City or county; ITED 1; ITBS 2; STAT 3; CAT 4;
STEP 5; SRA 6; MAT 7

[The grade-level entries under each instrument are scrambled in this
copy and cannot be matched to individual school systems.]

School systems listed: Akron 8; Albuquerque; Anne Arundel, Md.;
Atlanta; Baltimore City; Baltimore County; Birmingham; Boston;
Brevard, Fla.; Broward, Fla.; Buffalo; Caddo Parish, La.;
Charlotte-Mecklenburg; Chicago; Cincinnati; Clark, Nev.; Cleveland;
Columbus; Dallas; Dayton; DeKalb, Ga.; Denver; Detroit;
East Baton Rouge, La.; El Paso; Fairfax; Flint; Fort Worth;
Grand Rapids; Greenville; Honolulu; Houston; Indianapolis;
Jacksonville; Jefferson, Ala.; Jefferson, Col.; Jefferson, Ky.;
Jersey City; Kanawha; Kansas City; Long Beach; Los Angeles;
L.A. County; Louisville; Memphis; Miami; Milwaukee; Minneapolis;
Mobile; Montgomery, Md.; Nashville-Davidson; Newark; New Orleans;
New York; Norfolk; Oakland; Oklahoma City; Orange Co., Fla.;
Palm Beach; Philadelphia; Phoenix; Pinellas, Fla.; Pittsburgh;
Portland; Portland (Metro); Prince Georges, Md.; Providence;
Richmond; Rochester; St. Louis; St. Paul; San Antonio; San Diego;
San Francisco; San Jose; Seattle; Syracuse; Tampa; Toledo; Tucson;
Tulsa; Washington; Wichita; Worcester; Youngstown.


1 Iowa Tests of Educational Development
2 Iowa Tests of Basic Skills
3 Stanford Achievement Tests
4 California Achievement Tests
5 Sequential Tests of Educational Progress
6 Science Research Associates Achievement Series
7 Metropolitan Achievement Tests
8 In some of the school systems listed, other tests are used.

Source: Akron Public Schools. "Basic Testing Programs Used in Major
School Systems Throughout the United States." Akron, Ohio: Akron
Public Schools, April 1968.






